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Abstract 

(N 

A recurrence scheme is defined for the numerical determination of high de- 
gree polynomial approximations to functions as, for instance, inverse powers 
near zero. As an example, polynomials needed in the two-step multi-boson 
{NJ | (TSMB) algorithm for fermion simulations are considered. For the polyno- 

^ \ mials needed in TSMB a code in C is provided which is easily applicable to 

polynomial degrees of several thousands. 

» 

1 Introduction 

High quality polynomial approximations of inverse powers near zero are needed, for in- 
^ ■ stance, in multi-bosonic algorithms for numerical Monte Carlo simulations of fermionic 
quantum field theories [Tj. Least-square optimization offers an effective and flexible frame- 
work, in particular in two-step multi-boson algorithms 2j, where the functions to be ap- 
proximated are typically of the form x~ a P(x) with a > and some given polynomial 
P(x). The approximation has to be optimized in an interval covering the spectrum of a 
positive matrix, that is for x E [e, A] , < e < A. The polynomial P{x) can be just a 
constant or another cruder approximation of x~ a . 

The required precision of the approximation is rather high, it should be preferably 
near machine precision. Since the condition number A/e can be as high as 10 6 or more, 
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the required polynomial degrees can be in the multi thousand range. This is due to the 
fact that - with A = 0(1) — 0(10) - the lower limit is of the order e = (am/) 2 where arrif 
is the mass of the elementary fermion in lattice units. Interesting values for the fermion 
mass are, for instance in QCD or in supersymmetric Yang-Mills theory, arrif = 0(1O -3 ) 
or smaller. (For examples of recent TSMB applications with light fermions see [Sj and 
[H Ej ) Other possible applications of high degree least-square optimized polynomial 
approximations are, for instance, in fermionic simulation algorithms for improved actions 
jHllZl or the evaluation of the matrix elements of Neuberger's overlap fermion action [H|. 

Least-square optimized polynomials in the context of the TSMB algorithm have been 
considered previously in [01 Elj. (For a general introduction see, for instance, [llj.) In 
the present paper the recurrence scheme proposed in [TUj is worked out in detail and an 
optimized C code is provided which is suitable to deliver the required polynomials with 
an acceptable amount of computer time. 

The plan of this paper is as follows: in the next section some basic definitions are 
briefly summarized and the notations are introduced. In section El the recurrence scheme 
for the numerical determination of the least-square optimized polynomials is presented 
in detail. Section 0] is devoted to the explanation of the C code which can be used to 
perform the recurrence. 

2 Definitions 

In this section the basic definitions and notations are introduced which will be needed 
later. For more formulas and discussion see and especially where a rather complete 
collection of formulas is given (and a few misprints in [5| are also corrected). 

In the general case one wants to approximate the real function f(x) in the interval 
x G [e, A] by a polynomial P n {x) of degree n. The aim is to minimize the deviation norm 

S n =^N~l J"dxw(x) 2 [f(x)-P n (x)] 2 y . (1) 

Here w(x) is an arbitrary real weight function and the overall normalization factor N e x 
can be chosen by convenience, for instance, as 

N t , x = £dxw{xff{x) 2 . (2) 

For optimizing the relative deviation one takes a weight function w(x) = /(x) _1 . 

As discussed in detail in ref. |10j . obtaining the optimized polynomial P n {x) in case 
of x G [e, A] with small positive e and f(x) = x~ a /P(x) (a > and polynomial P(x)) is 
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equivalent to the inversion of an ill-conditioned matrix with a very large condition number. 
As a consequence, the problem can only be solved by using high precision arithmetics. 

Another place where rounding errors may be disturbing is the evaluation of the poly- 
nomials - once the polynomial coefficients of P n (x) are known. Fortunately, this second 
problem can be solved without the need of high precision arithmetics by applying a suit- 
able orthogonal expansion. In fact, after the optimized polynomials are obtained with high 
precision arithmetics, their evaluation can be performed even with 32-bit arithmetics if 
this orthogonal expansion is used. The orthogonal polynomials (// = 0, 1, 2, . . .) are 

defined in such a way that they satisfy 

J dxw(x) 2 $ li (x)$ t ,(x) = 6p,q 1/ . (3) 
The polynomial P n (x) can be expanded in terms of them according to 

n 

Pn{x) = d n»®v{x) ■ (4) 

Besides the normalization factor q u let us also introduce, for later purposes, the integrals 
p u and s u [y = 0, 1, . . .) by 



q v = J dxw(x) 2 § u (x) 2 , 

rX 

p„ = / dx w(x) 2 $ u (x) 2 x 



rX 

s v = / dxw{x) 2 x v . (5) 

It can be easily shown that the expansion coefficients d nv minimizing 6 n are indepen- 
dent of n and are given by 

d w = d v = — , (6) 

where 

rX 

\2 



dxw(x) f(x)<b v (x) . (7) 
The minimum of the deviation norm with normalization defined in (J2J is: 



f n b 2 1 ^ 



The orthogonal polynomials satisfy three-term recurrence relations The first two 
of them with /i = 0, 1 are given by 

$o(x) = l, $!(x)=x--. (9) 

so 
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The higher order polynomials $n(x) for /i = 2, 3, . . . can be obtained from the recurrence 
relation 

$ M+ i(z) = (x + /3 M )$ M (z) +7 M _i$ M _x(x) , (// = 1,2,...) , (10) 

where the recurrence coefficients are given by 

& = -- > 7,-i = — ^- • (11) 

3 Recurrence scheme 

A recurrence scheme is defined as a sequential algorithm for building up the system 
of orthogonal polynomials and determining the expansion coefficients of the optimized 
polynomial approximation starting from the function f(x) to be approximated and from 
the weight function w(x). The necessary ingredients are, in fact, the moments of the 
weight function s u defined in © and the moments of w(x) 2 f(x): 

rX 

t u = dxw{x) 2 f{x)x u , (^ = 0,1,...). (12) 



Since the calculations have to be performed in high precision arithmetics, these moments 
are needed to high precision which is easily reached if the integrals can be analytically 
performed. Otherwise the moments themselves have to be numerically calculated with 
high precision arithmetics. 

In case of the TSMB applications we typically have to consider the function f(x) = 
x~ a /P(x). In this case, if one chooses to optimize the relative deviation, the basic integrals 
defined in (JHJ) and (fT^j) are, respectively, 

2 2a+v 



dxP{x) x 

X dx P{x)x a+V . (13) 

These integrals can be simplified if, instead of choosing the weight factor for relative 
deviation w (x) 2 = P(x) 2 x 2a , one takes its square root: 

w(x) 2 = P(x)x a , (14) 

In practice this leads to similarly good approximations as the previous choice and the 
basic integrals become simpler: 

rA 



t u = I dxx v . (15) 
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One can devise recurrence schemes differently, for instance, based on the polynomial 
coefficients of the orthogonal polynomials. A good scheme can be built on the moments 



' '/if 



A 

dx w{x) 2 <&,,{x)x u 



(jl = 0, 1, . . . , n; v = n, ji + 1, . . . , 2n — \i , (16) 

using also the coefficients of the next-to-leading order term of the orthogonal polynomials. 
According to the recurrence relation in ([10)1 we have for /x = 1,2,... 

&»(x) =x^ + f^~ 1 + --- (17) 

where the next-to-leading order coefficient / M satisfies 

/i = --, U + i = f, + P, (/i = l,2,...) . (18) 

so 

The recurrence coefficients 7 M _i can be expressed from 

<? M = , p M = r M)M+ i + / M r w (19) 

and eq. (fTTj) as 

fl, = -/„ - ! ^ ±i , 7,-i = Tj ^~ ■ (20) 



^„ ' /X— 1,/t— 1 



It follows from the definition (fl7)|) that 

r 0u = / dxw[x) 2 x v = s u , 

ri„ = / dxw(x) 2 (x v+1 + f\x v ) = s u+ i + fis u = s u+ i-^-^ . (21) 

The recurrence relation (fTTH) for the orthogonal polynomials implies 

= r^+i + /5 M r^ + 7 M -ir M _i^ . (22) 

This completes a closed recurrence scheme for calculating (3^ and 7^-1 which can then be 
used to determine the orthogonal polynomials by (jlOj) . 

In order to see how this recurrence scheme works let us display a few steps explicitly. 
Let us assume that we want to obtain the orthogonal polynomials up to degree n and we 
know already the basic integrals s v for v — 0, 1, . . . , 2n. According to (|21j) we then have 

r u = s u (u = 0, 1, . . . ,2n - 1) , r lu = s u+l - ^-^ {v = 1, 2, . . . , 2n - 1) . (23) 

so 
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Using fllEJ) and (jUJ) we obtain 

A = , Pi = ~fi , 7o = • 24 

Now $2% gives 

T2u = r 1;V+1 + (5 x r lv + 7 ri^ {v = 2, 3, . . . , 2n - 2) . (25) 
This can be used for calculating 

/ 2 = A + /i, fa = -h-— , 7i = -—, (26) 
and so on. At the end there is 

r n -l,u = r n - 2 M+l + (3n-2r n -2 ) u + 7n-3?"n-3^ {v = U ~ 1, 71, 71 + 1) (27) 

which gives 

/n-l = Ai-2 + /n-2 , Ai-1 = ~fn-l 1 , ln-2 = ] ~ • (28) 

Finally one calculates for q n 

Tnn '"u- l,n+l 

+ ^ n -l^n-l,n + 7n-2?"n-2,n • (29) 

After determining the orthogonal polynomials one can also perform a similar recur- 
rence for the expansion coefficients of the least-square optimized polynomials given in (JHJ). 
Now one assumes the knowledge of the moments t u in (jl2j) and considers 

b^u = J dxw(x) 2 f(x)<&^x)x v , 

H = 0, 1, . . . , n; f = 0, 1, . . . , n — \i . (30) 
From this one obtains b v = b u0 . For starting the iteration one has 

bou = tv , b\ v = t u+ i + f\t v , (31) 

and then one uses 

(32) 

The basic recurrence relation in (jlOjl can also be used for the evaluation of integrals 
of least-square optimized polynomials. Examples are t u in (fT^j) and s u in (jUj) if P(x) is 
expanded in the corresponding orthogonal basis. 
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Runtime performance for a basic computation of polynomials 
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Figure 1: Comparison of the runtime performance when calculating 
the first three polynomials P^\ P^ and 



4 C code for polynomials needed in TSMB 

The implementation of the presented recurrence scheme has to be done with a sufficiently 
high numerical precision. We have found that the required number of digits increases 
linearly with the polynomial order parameter. Since this polynomial degree can be in the 
multi thousand range one has to carefully choose a way to implement this. In particular 
it is obvious that the standard 32-bit or 64-bit precision, corresponding to roughly (9(10) 
digits of precision, is insufficient for this task. 

Possible environments are computer algebra systems like Maple or more low-level 
computer programming languages like C or Fortran combined with a suitable library for 
multiprecision arithmetics. Since runtime performance is important we have opted for 
the programming language C. 

There are several freely available libraries for multiprecision arithmetics [T2" | I13 | IHj. 
These libraries vary in usability and performance. For very large numbers, i. e. more 
than 12000 decimal digits, these libraries use the Schonhage-Strassen multiplication, an 
asymptotically optimal algorithm. However, most of the time we need polynomial orders 
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of 0(1000), for which less than 5000 digits are needed. Therefore other algorithms like the 
0(n 2 ) standard algorithm for smaller numbers or the 0(n ls ) Karatsuba multiplication 
for intermediate large numbers should be used where applicable. This is done by the 
gmp and the CLN libraries, where the latter comes with easy to use functions for the 
calculation of sine, cosine and logarithm. Furthermore it can be configured to use the 
low-level routines of gmp, so it can be considered as a nice interface for that library. A 
comparison of different environments and libraries is shown in figure H The polynomial 
degree was fixed for this performance test, while only the number of used digits was 
changed. Runtime measurements are for the computer algebra system Maple, and for 
the programming language C using the libraries hfloat and CLN. In this example the 
library CLN was used without support for the faster low-level routines of gmp. When 
interpreting the results of this comparison it has to be kept in mind that no guarantee for 
a similar level of optimization for the different programs can be given (although it should 
be roughly true). Let us also note that in Maple one could switch the precision to lower 
values when sufficient, while this would not be so easy for the C libraries. This would 
allow to gain in performance for the Maple program but it would make a comparison with 
the other programs complicated. 

The source code for the program that uses the CLN library is quadropt.C. It can be 
compiled with the g++ compiler 1 and it has to be linked to the previously installed library 
CLN. The executable needs one parameter as an argument, a filename. This file is usually 
quadropt. dat, from which the options like polynomial degrees and interval are read. These 
options will be explained in the next subsections. The parameter file quadropt.dat is 
organized to have a line of comments and then a line of corresponding arguments. This 
structure is repeated for all arguments (see figure |2J). 

4.1 The polynomial 

The polynomial is supposed to be a polynomial approximation to the function fix) = 
x~ a , i. e. P(x) = 1. As weight function u(x) 2 the squared inverse of f(x) is taken. The 
parameter a is given by the first value in the second line of arguments (i. e. the fourth 
line altogether) of the parameter file. In case of the TSMB applications this is a = -y-, 
where Nf is originally the number of fermionic flavours. However, Nf does not have to be 
integer valued, for instance, in the case of Majorana fermions or if determinant breakup 

lr To be precise, the program indeed has to be compiled with a C++ compiler, because it uses C++ 
features like streams. The structure of the program is, however, very C-like, i. e. it is based on usual 
function calls instead of calls to an objects member function. 
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Polynomial Order parameters (5+) : 

16 60 10 90 20 30 70 90 
Alpha, Epsilon, Lambda 
1.0 8.0e-3 4.0 

W3(nom. , denom.), MaxNeubergerTerms (<=0 -> P12-method) , #P5-Newton iterations 
2 3 -11 

Desired Precision (-l=default, +X=def ault+X) , RootsPrecision (-l=default, etc.) 

-1 -1 
Used data-subdirectory 

Data/Qcd2/Cn500 . 16 . 60 

Figure 2: Structure of the parameter file quadropt.dat. 

is applied. In general any real value of a will work with the code. However, the necessary 
number of digits for a precise determination of the recurrence coefficients might differ if 
a is drastically different from those in the applications of the TSMB algorithm. 

In the first line of arguments the requested degrees for the different polynomials have 
to be given. The first value ri\ is for the polynomial degree of Although at least 

five numbers have to be specified it is possible to set the other values to zero. In this way 
only the first polynomial is calculated. 

Another important input is the approximation interval [e, A] . The borders of this 
interval have to be given in the second line of arguments after a. 

We have found that the required number of digits for the calculation of the necessary 
integrals and coefficients does only depend on the polynomial order. For the first poly- 
nomial it is sufficient to set the number of digits to (40 + 1.6ni). One can further 
benefit by recognizing that some intermediate integrals need to be calculated with a lower 
precision of roughly (30 + 0.9ni). Furthermore, the roots can be calculated with an even 
lower precison of roughly (30 + 0.5ni). The situation might be different if the parameters 
a, e and A are drastically different from those used in TSMB applications. 

In the C program as presented here the value of the number of digits is chosen according 
to the above rules. We do not exploit the possibility to use less digits at places where it 
would be in principle possible. The program might even use more digits if this is required 
by another calculated polynomial. Usually the necessary number of digits is determined 
by the fourth or fifth polynomial order. This is a drawback of the implementation in C 
where changing the number of digits is not really supported by the CLN library. In Maple 
one can set the number of digits as appropriate for each integral. Our C program uses 
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a single fixed precision for all computations. The only exception is the ordering of the 
polynomial roots according to ref. For this the calculation of a logarithm is needed 
roughly 0(n 3 ) times. Since the computation of logarithms scales much worse than a 
simple multiplication, the ordering of the first polynomial roots could consume more than 
90% of the runtime already for modest values of n\. Therefore the ordering is calculated 
by an appropriate lower precision, which is sufficient for this purpose. 

The default choice of the number of digits can be overruled by changing the two values 
in the fourth line of arguments from their default values of —1. With the first value the 
number of digits for the integrals can be specified. The second value gives the number 
of digits for the ordering of roots. If some value is set, then this is taken as the number 
of digits. If, however, this number starts with a + character, e. g. as in +10, then that 
number is added to the default number of digits. While —1 leads to the default number 
of digits, values smaller than —2 reduce the number of digits from their default value. 
A value of —2 as the second value sets the number of digits for the roots equal to the 
number of digits for the other integrals (this is highly not recommended). 

Of course one can compute the polynomials at a much higher precision than needed, 
e. g. by giving +150 as a first value for the number of digits. In this way one can be sure 
that the result is good enough. A better way is to compute the polynomials twice, once 
with the default precision ( — 1), and once with a slightly better precision (+10). One can 
then compare the results which is printed after conversion to the internal 64-bit floating 
point datatype. The results should agree. 2 

The output is written to files in the subdirectory specified by the last line of arguments. 
The file for the roots is named starting with Coef, and first in this file there are the 
coefficients of the polynomial and then the real and imaginary parts of the roots. The 
file for the recurrence coefficients of this first polynomial is named starting with Cort. If 
further polynomials are calculated there will be several files starting like that. Among 
these files the one for the first polynomial is the one not having a -4 nor a -5 near the end 
of the filename. In this file (n + 1) coefficients d u are given first, then n values of (3^ and 
finally (n — 1) values of 7^. 

2 The ordering of the roots might change slightly. This is due to the fact that different orderings fullfil 
the same optimal ordering condition and it is just by chance which one of them is printed. However, the 
orderings should only differ by flips of root pairs with the same absolute value. 
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4.2 The polynomials P^ and P^ 

In TSMB applications PW is only a crude approximation to f(x) = x~ a , i. e. the poly- 
nomial order n\ is quite small. This is corrected by a polynomial of higher degree 
ri2, which is defined such that pWp( 2 ) « /(a;) to a better precision. This means that we 
set P = P^ and repeat the calculation as for P^\ 

In a reweighting step the remaining deviation of pWp( 2 ) can be further corrected by 
a fourth polynomial P$, defined similarly, which is computed by setting P = p( x )p( 2 ). 
The polynomial degrees for Pf® and P^f are given as the second and fourth argument in 
the first line of arguments. The weight function is set to the inverse itself. 

While the required number of digits for the first polynomial scales with a relatively 
small slope according to (40 + 1.6ni), the corresponding slope is larger for the second and 
fourth polynomials because of the nested integrals introduced by setting P(x) to more 
complicated functions. In our cases (70 + 2.2n 2j 4) gives a good estimate for the necessary 
number of digits. 

The recurrence coefficients d u , (3^ and 7 M for the second polynomial are appended to 
the output file containing the recurrence coefficients of the first polynomial. The ones for 
the fourth polynomial are written to a separate file with a name ending with -4. C++. 

4.3 The polynomial P^ 

For the TSMB algorithm, besides the before mentioned polynomials, one also needs an 
approximation Pff of the inverse square root of P®. This again is given by a polynomial 
labelled as the third one: 

Pg™(P$Y h . (33) 



n 3 \^ n 2 

Since any error made in this polynomial approximation is uncorrectable in the TSMB 

updating algorithm, it is important that this approximation is good, especially at the lower 

end of the interval. To improve that part this polynomial is optimized for the interval 

[e',A], with e' = ^je. Furthermore one can adjust the weight function to w(x) 2 = x~ UB . 

The polynomial degree for Pf$> is read from the third argument in the second line of 

arguments, nominator and denominator of uj% are given by the first two integer numbers 

in the third line of arguments. 

The calculation of the third polynomial can be based on Neuberger's formula ^H] • The 

third value in the third line of arguments gives the number of terms taken in Neuberger's 

formula. For K terms this polynomial is defined as: 

1 K 1 

p(3) ~ _ V - (M) 

K^P(2)cos^(,-|) + sin 2 ^( S -I) ' 1 ] 
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For large values of K the computation using becomes slow. In this case one can 
use another method to obtain a good polynomial P® faster. This is implemented by a 
series of fifth polynomials Pfg which are approxiamtions to a good with increasing 
precision. The same weight function as for the third polynomial is used. The members of 
this series are calculated iteratively. Each polynomial in the series may have a different 
polynomial degree, typically in a non- decreasing sequence. With this method it is also 
possible to further reduce the systematic error at some fixed polynomial order because 
the program uses a smaller lower bound for the approximation interval e" = -^e. The 
polynomial degrees {n§ ) of the P$ sequence have to be given one after the other as the 
last arguments in the first line of arguments. 

The starting point for this iteration is some known approximation to the third polyno- 

(3) (3) 

mial PL/ . From P}J better approxiamtions can be obtained by the following Newton-type 
iteration: 

P & = ^( P W+J^) * = 0,1 )2) .... (35) 

The implementation of this iteration calculates the term (3* (2) in the same way as the 

(fc) 

fourth polynomial is calculated, i. e. there is a polynomial approximation there. 

After a few iterations, typically two or three, the quality of PA,n (also called "fifth 
polynomial" ) loses any dependence on the starting point of the iteration. Therefore it is 
also possible to replace the costly computation based on by some cruder approxima- 
tion. One possibility is to take for P$ a second polynomial p( 2 )' for a first polynomial of 
pW' , which approximates f'(x) = with the order n\. Although this is not sufficient 
for using it elsewhere, it is very useful as a cheap starting point for the Newton iteration. 
This method is applied by the program if the specified number of terms for Neuberger's 
formula is negative. 

Present experience shows that using this crude estimate as a starting point is sufficient. 
One should then first create two or three fifth polynomials of lower order than the desired 
order for the final P^ 5 \ At the end one can increase the polynomial orders by larger 
steps. Usually a good approximation is achieved if the last polynomial order of Pfg is 
roughly 20% larger than n-i. After the computation is done, the parameters of each fifth 
polynomial are stored in a seperate file. 
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